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Abstract. News reports of the last few years indicated that several intelligence agen¬ 
cies are able to monitor large networks or entire portions of the Internet backbone. 
Such a powerful adversary has only recently been considered by the academic litera¬ 
ture. 

In this paper, we propose a new adversary model for Location Based Services (LBSs). 
The model takes into account an unauthorized third party, different from the LBS 
provider itself, that wants to infer the location and monitor the movements of a LBS 
user. We show that such an adversary can extrapolate the position of a target user by 
just analyzing the size and the timing of the encrypted traffic exchanged between that 
user and the LBS provider. We performed a thorough analysis of a widely deployed 
location based app that comes pre-installed with many Android devices: GoogleNow. 
The results are encouraging and highlight the importance of devising more effective 
countermeasures against powerful adversaries to preserve the privacy of LBS users. 

Keywords: location-based services, network traffic analysis, GoogleNow, privacy, 
mobile devices 


1 Introduction 

Modern surveillance systems that track the movements of cellphone users are more sophis¬ 
ticated than ever. Intelligence agencies can easily locate the cell tower used by a target 
and find his location. According to a Washington Post article [5], it is possible to exploits 
security vulnerabilities in the network used by mobile carriers around the world to provide 
services to their traveling customers. This network is called the Signaling System 7 (SS7) 
and once access to it is obtained, it is possible to track the location of anyone in the world 
and learn whether a person is walking down a specific street, driving, or taking a flight. 
When the approximate location of a target is known, stingrays [3] (or fake towers) can be 
used to redirect calls, monitor Internet traffic, steal phone’s data, and even install malware. 

These attacks, however, are active and (partially) intrusive. In particular, they are not 
completely stealthy and leave traces. Indeed, queries to the SS7 network can be logged and 
phones could be configured by experts to detect stingrays (e.g., IMSI-Catcher Detector on 
Android phones). In addition, these attacks can only be carried out in cooperation with 
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vendors, mobile carriers, or ISPs. Often the target is in a foreign country and special per¬ 
missions or agreements must be in place to be able to track his movements. 

In this paper we show that it is possible to locate the position of a cellphone by sim¬ 
ply monitoring the traffic of certain phone applications that provide location-based services 
(LBS). Clearly this is simple if the traffic is in the clear, but our main contribution is to 
show that it is possible to track users even when the traffic is properly encrypted. We be¬ 
lieve our method will have significant implications in the way location-based services are 
provided. LBSs are often accessed through apps that will be referred to as Location Based 
Apps or LBAs. LBAs are used to find friends and restaurants nearby, to locate points of 
interest, to check public transport timetables and even to search for deals or special offers. 
Several physical retailers (e.g., Best Buy, Kohls), also deploy location-based promotions to 
push notifications while the consumer is in or near the store. TripAdvisor, Booking.com and 
weather forecasting applications are other examples of LBAs. 

It is difficult to protect the privacy of users while at the same time provide useful LBSs. 
It is possible to obfuscate the exact position of a user but these obfuscation techniques are 
rarely adopted by vendors (location data is too valuable to them). Moreover, customers 
appreciate services or information they receive and do not seem concerned about sharing 
their location data with LBS providers. 

The contribution of our paper is to show that any third party can infer a user’s position 
by just analyzing the encrypted traffic from that user to the LBS provider. This can be 
performed in a non-intrusive way, without leaving any traces. For instance, an intelligence 
organization could monitor routers belonging to some Autonomous System (AS) traversed 
by LBS’s packets and this would be enough to infer a target’s position in a foreign country 
without involving that country’s ISPs, mobile carriers, or any other local entities. Encryp¬ 
tion or NAT’d addresses do not help much in this scenario. Indeed, we leverage results of 
previous works on analysis on encrypted traffic which already highlighted the possibility of 
identifying apps installed on a device [11,22], or the presence of a specific user within a 
network [23]. 

Another important point to consider in this context is that current LBAs have started 
to adopt push technology solutions to send “the right information at the right time” [14]. 
This is the maxim of the GoogleNow app which comes preinstalled on most Android devices 
and provides several services that are tied to the user’s position. Several LBAs that come 
preinstalled on a phone do not even ask permission to use location data. The adoption of 
push technology implies that the user is continuously tracked by the LBS provider, even 
when the app runs in the background [6]. 


Contributions. In this work we put forward a new privacy problem related to LBSs. We 
introduce a new adversary model for LBSs and propose a technique that unauthorized third 
parties may use to infer the position of a target user. Furthermore, we analyze one of the most 
popular LBAs, GoogleNow, and we show that the analysis of its encrypted network traffic 
reveals the position of a user with high accuracy. This research is inevitably controversial. 
The method we developed could be used to undetectably monitor movements of users and 
abuse their privacy rights. However, it should be considered as a warning to the research 
community to spur more research in the area and come up with effective countermeasures. 
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Organization. The rest of this paper is organized as follows. In Sect. 2, we introduce the 
adversary model. In Sect. 3, we detail the different phases of the attack. In particular, 
Sect. 3.1 explains the method used by the adversary to collect the relevant data from the 
LBS provider. Then, Sect. 3.2 details the data analysis approach that can be used to infer 
user locations. Sect. 4 and 4.4 respectively report on the results achieved when analyzing 
GoogleNow, and on the strategy to select the points that should be monitored by the at¬ 
tacker. In Sect. 5, we review previous work. Finally, in Sect. 6 we draw the conclusions and 
discuss some possible future works. 


2 The Adversary Model 

We assume the existence of an adversary A that can sniff the network traffic of a mobile 
device. The adversary does not need to intercept the entire network traffic but just the 
packets that are exchanged between the LBA and the LBS provider. The adversary may 
do so by compromising one of the network devices of any AS that routes the information 
between the mobile device and the LBS. We assume that the adversary does not want to 
be detected, and therefore he does not compromise the mobile device nor change the con¬ 
tent of network packets. It is well-known that the NSA can identify users around the world 
of specific services (such as TOR) by detecting packet “fingerprints” and monitoring large 
portions of the Internet. NSA accomplishes this by collaborating mainly with US telecoms 
firms under various programs [4]. 

We assume that the adversary is able to identify and isolate the network traffic of the 
user he is interested in, and, among those packets, he is able to identify and isolate packets 
that are generated by the LBA. The adversary is able to determine where discrete com¬ 
munications begin and end (such as the download of updated information from the LBS). 
This is possible, for example, by observing typical communication patterns of the LBA. 
Note that if the network traffic is not encrypted, then our adversary may trivially inspect 
the packet content and determine the location of the mobile device. Therefore, we assume 
that the network traffic exchanged between the mobile device and the LBS is encrypted via 
SSL/TLS. Furthermore, we assume that the LBS provider does not use any mechanisms to 
protect the privacy of its users, such as k-anonymity cloaking, etc. This assumption is based 
on the fact that current LBS providers do not implement these mechanisms. 

To launch the traffic analysis attack that we consider in this paper, the adversary must 
build a knowledge base that summarizes the network traffic exchanged between the LBA 
and the LBS when the mobile device is located in certain locations of interest. We assume 
that the adversary can collect this data by using bogus accounts and virtual mobile devices. 

2.1 Example Scenario 

Figure 1 represents a possible attack scenario. The target U is a user with a mobile device; 
the adversary is a malicious entity A that is after the geographical location of U. In this 
scenario, A is not allowed to collude neither with the LBS nor with the ISP (the owner 
of the Base Transceiver Station - BTS) otherwise locating the user would be trivial (but 
intrusive) or even impossible if the LBS or the ISP refuse to provide private information. 
The LBS provider will keep sending new and relevant information to the device as the user 
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Fig. 1: Attack scenario 


moves around. This can happen without the user’s intervention via push communication. 
Several LBAs, more prominently GoogleNow, work this way. In this paper we analyze under 
which conditions collecting the encrypted traffic exchanged between the LBS provider and 
the device is enough to determine the exact user’s geographical position. Figure 1 depicts 
possible position in the network where A can sit to intercept the network traffic. This is 
represented by the router R y highlighted in red. However, it is worth mentioning that A 
can be potentially in any router laying in the path from the user’s device toward the LBS. 
Furthermore, those routers may belong to different ASs. Thus, the adversary may infer the 
user position from ASs different from the one of the user. He can even perform the attack 
from foreign countries without involving mobile carriers, or any other local entity. 


3 The Attack 

In this section, we detail the approach used by the adversary to infer the actual position of a 
target user. The entire approach can be logically divided into two steps: the data collection 
phase, and the candidate locations selection. The aim of the data collection phase is to 
collect enough information from the LBS provider to learn how different locations can be 
distinguished from each other. During this phase, the adversary builds up its knowledge base 
that will later be used to infer the most probable locations the target can be found in. For 
the sake of simplicity, we will assume that the LBS sends the same information to all users 
in the same location. Taking into account that the LBS may send personalized information 
to its users is left as future work. 

3.1 Data Collection 

Suppose that the adversary is interested in localizing users in a given area. First, the adver¬ 
sary logically divides the entire area in n subareas, and arbitrarily chooses a point in each 
subarea as a representative for that location. The size of the subareas is chosen according 
to the desired accuracy and the granularity of the information provided by the LBS. Hence, 
the adversary comes with a set of point locations £ = {Zi, I 2 , ■ ■ ■ l n }- 

Then, the adversary collects data from the LBS about all the locations in £. This can be 
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Table 1: Example of the adversary knowledge base, user dataset related to a user position 
during time, and possible guesses. 


[Adversary Knowledge Base 

LocID 

Bytes 

Timestamp 

1 

35780 

1399743000 

2 

30780 

1399743000 

* 

* 

* 

* 

* 

* 

1 

36780 

1399743060 

2 

30784 

1399743060 


1 User Dataset 1 

LocID 

Bytes 

Timestamp 

? 

35780 

1399743000 

? 

35780 

1399743020 

? 

36780 

1399743040 

? 

36780 

1399743060 

? 

30784 

1399743080 

? 

30784 

1399743100 


accomplished by using the same LBA of regular users and by spoofing the GPS coordinates 
pretending to be in each location Zj for 1 < i < n. The adversary periodically performs the 
procedure above to learn the traffic pattern of the LBS over time (e.g., data sent by the LBS 
may change according to daily or weekly trends). 

The network traffic that the adversary collects is used to build its knowledge base. The 
following steps are performed during this phase: 

Prefiltering: The network traffic is analyzed with a network protocol analyzer, and only 
the packets directed towards, or coming from, the network of the LBS are preserved. 
Knowledge Base Record Composition: For each location l t that is monitored, the ad¬ 
versary adds a record in its knowledge base composed of the following fields: 

— Location ID (LocID for brevity in the following): An identifier of the probed location 
li G C. 

— Bytes: The total size in bytes of the transmitted and received encrypted packets that 
belong to the same TLS/SSL session. 

— Timestamp: The timestamp of the first packet of the TLS/SSL session. 

An example of knowledge base that the adversary would create is reported on the left of 
Table 1. 


3.2 Selection of the Candidate Locations 

To track a user’s position, the adversary relies upon only two fields: the sum of the ex¬ 
changed bytes of a TLS/SSL session and the timestamp. This information is derived from 
the header of the packets, which is not encrypted by the SSL protocol. To learn the position 
of a given user U at time to, it is enough for the adversary to collect the communication 
traffic between the user’s LBA and the LBS between time to — t and to- For each TLS/SSL 
session, A calculates the fields described above and creates the user dataset. This is shown 
on the right of Table 1. 

At any given moment, each location is potentially characterized by a fixed amount of 
bytes. As such, the adversary determines the candidate locations by analyzing those loca¬ 
tions that have generated an amount of bytes similar to the entries of the user dataset. 
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Namely, suppose that A wants to determine the position of the user U at time to- A builds 
a filtered adversary knowledge base containing only the instances of the knowledge base 
such that their timestamps fall within the time frame [to — t, to]- The size of the time frame 
depends on the specific LBA and LBS taken into consideration, and on the typical behavior 
of the user. We assume that, within the targeted time frame, the user does not move from 
his location. The adversary restricts the number of possible locations studying the statistical 
distribution of the filtered adversary knowledge base and the user dataset. 

In the experiments, we will consider also the case of time-misalignment between the fil¬ 
tered adversary knowledge base and user dataset. This case is useful when the adversary is 
not able to collect data during the time frame [to — t, to] - hr such a case, the adversary may 
use a different time frame [to — t — S, to — <5], for a given <5 > 0. 

The adversary may use a statistical distance measure to quantify the distance between 
the samples of the user dataset, and the samples of the filtered adversary knowledge base, 
for each possible location. This will allow A to settle on a list of candidate locations where 
the user was at time t-o- 

Several statistical distance measures can be used for this purpose. However, in the exper¬ 
imental section, we will show that our approach is accurate even when using a very simple 
distance metric. Let us indicate with x the user dataset, with y the filtered adversary knowl¬ 
edge base, and with C the set of all possible locations. Furthermore, with the notation y[lj] 
we refer to the subset of the adversary knowledge base y related to an individual location 
k £ C. In other words, y[li] contains all the instances of the filtered adversary knowledge 
base such that the field LocID is equal to U.. Then the adversary will select a candidate 
location set S of size k in the following way: 

5CC E d(x,y[k]) , (1) 

\ S y =k iiGS 


where d(x,y[li\) indicates some statistical distance measure between x and y[h]. In the 
experiments, we used the following definition of distance: d(x,y[lj\) = \m(x) — m(y[li])\, 
where m(-) is the median function. Once the size k is fixed, Equation 1 allows to select a 
candidate location set S of size k, composed of the locations U that minimize the overall 
distance between x and y[U]- 

4 Experiments and Results 

To prove the feasibility and the accuracy of our approach, we performed a thorough analysis 
of one of the most popular and advanced LBAs: GoogleNow. GoogleNow is an application 
provided by Google which comes preinstalled with the vast majority of Android devices [14]. 
It is also available on iOS, Google Glass and even on Android Wear devices. GoogleNow 
is not only a LBA, but it acts also as a personal assistant by providing personalized infor¬ 
mation to the user. User-based and location-based information are sent together within the 
same encrypted traffic. However, we will show that the use of encrypted communications 
does not hinder the process of identifying user locations as long as some of the GoogleNow 
user’s preferences are known a priori. 

GoogleNow app operates regardless of the interactions with its users, and independently 
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determines when information should be downloaded from the LBS server [13] (unless a re¬ 
fresh is forced by the user). We can therefore speculate that GoogleNow app periodically 
sends the GPS location of the user to the LBS (Google servers). The LBS then replies 
with the information related to the GPS position sent by the GoogleNow app (e.g., nearby 
restaurants, bus stops, and/or images of the location). We confirmed this by installing the 
GoogleNow app on an Android X86 Virtual Machine (VM) and extracting the data ex¬ 
changed between our VM and Google servers via a man-in-the-middle proxy. This experiment 
confirmed that the GoogleNow app sends the GPS coordinates of the user but also much 
more data, including the information currently displayed by the app on the smartphone. 

4.1 Data Collection of GoogleNow Data 

We performed a location spoofing attack to collect the encrypted network traffic exchanged 
by the GoogleNow app and the Google servers, and to build the adversary knowledge base as 
described in Sect. 3.1. Each spoofed GPS user location corresponds to a point the adversary 
is interested in monitoring. We used the mitm-proxy software for this task [1] configured 
as a transparent proxy on a network of Androidx86 Virtual Machines run in a Virtualbox 
hypervisor on a Linux host. Each VM has been configured with a different user account. 
We configured different preferences for each account to determine to what extent Google 
personalizes the data sent to each user. Because of space restrictions, we report on the results 
achieved for a single configuration of the user preferences, that is the one where the user 
settled its home and work locations only. Similar results were achieved for the other accounts 
that we analyzed and will be reported in the full version of this paper. 

GPS locations exchanged between the app and the server are encrypted within SSL, thus 
we performed a man-in-the-middle attack. To this aim, a self-signed certificate was copied 
into each VM and added to the Android’s root certificates. In this way, the SSL traffic 
was decrypted and we found out that GoogleNow uses a protocol called Protobuf as data 
interchange format [2]. This protocol was designed by Google to be smaller and faster than 
XML. By analyzing the data structure of the protobuf messages that were intercepted by 
the mitm-proxy, we were able to identify the fields that contain the latitude and longitude of 
the user. These fields are sent to the Google servers in all the HTTPS requests that contains 
the following string within the URL: “tg/fe/request”. 

We collected data about an area of two square kilometers positioned in the center of a 
large European city. The adversary simulated the presence of its dummy users in the area by 
moving them on a grid of 5 x 10 points (50 different LocID in total). The location (i,j) of the 
grid is identified by the label i_j. Sect. 4.4 provides the motivation behind the selection of 
these parameters. In order to collect a large amount of data in a short time range, rather than 
changing the user position of the VM through a MockLocationProvider and then wait for a 
GoogleNow genuine request, the adversary replayed a request containing the “tg/fe/request” 
string in the URL several times, modifying the actual position with every point of the grid. 
The network traffic intercepted by the proxy has been used to feed the knowledge base of 
the adversary. 

For the experiments we had to simulate the target users as well. Therefore, during the 
same period of three weeks, different VMs have been configured with distinct user accounts. 
Users moved into the monitored area, and their respective traffic was collected and stored 
separately. Clearly, this last step will not be performed when tracing real users. 
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Fig. 2: Probability Distribution of the Bytes exchanged between the GoogleNow app and the 
Google servers. 


4.2 Exploratory Data Analysis 


In this section, we analyze the collected data to improve our understanding of the GoogleNow 
traffic dataset. This exploratory data analysis is performed over the data collected for an 
account that has been configured by only specifying home and work locations. All the re¬ 
maining user profiles that we analyzed show a very similar behavior. Figure 2 reports on the 
probability distribution of the bytes per SSL session exchanged between the GoogleNow app 
and the Google servers during the entire monitored period. On average, 32,604 bytes were 
exchanged per session, with a standard deviation of 7,518. The minimum recorded value is 
equal to 80, while the maximum is equal to 83,831 bytes. The median is 31,804 bytes, while 
lower and upper quartiles are equal to 27,791 and 36,520, respectively. 

To determine whether the bytes exchanged between the GoogleNow app and the Google 
servers might be useful to identify the actual location of the user, we analyzed the statistical 
distribution of the bytes exchanged in each monitored location. Figure 3 shows the boxplots 
diagram for all the 50 locations of the adversary knowledge base. The boxes extend from 
the lower to the upper quartile values of the data, with a line at the median. The whiskers 
extend from the boxes indicating variability outside the lower and upper quartiles. Figure 3a 
shows the statistical distribution of the bytes received in all the locations during the entire 
period of collection (3 weeks). Figure 3b was obtained while analyzing only one hour of 
traffic randomly selected among the three weeks. Note that in Fig. 3a, all the locations show 
a similar behavior. They have a mean value that is rarely greater than 40,000 and lesser 
than 20,000. The variance is very high, and lower and upper quartiles are quite far from the 
median value. With statistical distributions so similar to each other, it might be difficult 
to infer the actual location of a user. We quickly realized that the time of the day has a 
great influence on the information provided by GoogleNow, thus we limited the analysis to 
a period of time one hour long, randomly selected among the three weeks period. Figure 3b 
shows the boxplot diagram related to this subset of data. It can be observed that almost all 
the locations have a very tight variance. First and third quartile are very close to each other. 
Furthermore, once a particular size is selected, only a few locations may have produced it. 
This is the main reason that led us to consider the time as an important parameter in our 
analysis. 
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(a) Analysis of the entire adversary knowledge base (3-weeks of traffic). 
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(b) Analysis of only a random hour of the adversary knowledge base. 

Fig. 3: Statistical distribution of the bytes exchanged between the GoogleNow app and the 
Google servers per monitored location. 


Daily Pattern. During the exploratory data analysis, we realized that the amount of bytes 
exchanged between GoogleNow and the Google servers follows a daily pattern distribution. 
In particular, during daily hours it ranges from 26,000 to 32,000 bytes, whereas during the 
night it falls down in the interval between 22,000 and 24,000 bytes. In Sect. 4.3, we will show 
how this daily pattern influences the accuracy results. 

4.3 Accuracy Results 

In the following, we will report on the results of the tests that we performed on the collected 
dataset. All the experiments presented next represent the average of 10,000 tests. In each 
test, the adversary tries to infer the position of a user. The fc-identifiability of a tested user 
position is defined as 1 if the actual position of the user is within the set S of k candidate 
locations selected with the approach described in Sect. 3.2. Otherwise fc-identifiability is 
defined as 0. Thus, the k -accuracy is the average of the fc-identifiability values of each tested 
instance [17]. 

In Fig. 4, we show the fc-accuracy of a user location when varying k and t , where k is the 
size of the candidate locations set and t is the size of time frame used to filter the adversary 
knowledge base. Observe that for k = 8, we reach a value close to 95% with a time frame of 
only 20 minutes. The behavior of the fc-accuracy is asymptotic and it reaches a value close 
to the maximum already at around t = 20. In other words, the adversary has to analyze only 
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t (minutes) 

Fig. 4: Accuracy of Locations Sets: effect upon accuracy of varying k (size of the candidate 
locations set) and t (size of the time frame). 



Fig. 5: Accuracy of Locations Sets: Effect upon accuracy of varying S (time delay between 
the filtered adversary knowledge base and the user dataset), (fc — 4, t = 60) 


20 minutes of traffic to reach the best accuracy performance, independently of the value k 
selected. Other combinations of k and t also provide reasonable accuracy performance. For 
instance, for k = 8, 5 minutes of traffic are enough to reach an accuracy of 79%, which is 
quite remarkable. 

In Fig. 5, we show the effect of varying 5 , that is the difference of time between the 
filtered adversary knowledge base and the user dataset. In general, larger delays result in 
lower accuracy. However, the figure shows a cyclic behavior that reflects the daily activity 
highlighted in Sect. 4.2. Thus, if the adversary does not have in its knowledge base instances 
that fall in the same time frame of the user dataset, then it is better to use a filtered 
knowledge base that is one day older (1440 minutes) than one that is only 12 hours older 
(720 minutes). Indeed, in the former case the accuracy is slightly below 60%, while in the 
latter case it is around 12% only. The figure also shows a decrease of the peaks that are 
in correspondence of every 24 hours. This is mainly due to the fact that the information 
provided by the app is being constantly updated, and it becomes obsolete after a few days. 


4.4 Granularity of the Monitored Area 

In the experiments that we reported in Sect. 4, the adversary monitored 50 points that 
were distributed within an area of two square kilometers. Each point represented therefore 
a square area of 200 m 2 . The adversary arbitrarily chooses one point within this square 
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Fig. 6: Heat-matrices for different sizes of the cells 


as a representative. However, the granularity of 200 m 2 does not necessarily match the 
granularity adopted by GoogleNow: if the granularity of GoogleNow is finer, then there are 
points within the area that have not been considered during the collection phase; on the 
other hand, if the granularity of GoogleNow is coarser, then several different points may 
become indistinguishable, thus impeding detection. 

We performed an additional experiment to learn the granularity used by Google services 
in a given area. To this aim, we run the data collection phase using different granularities: 5, 
10, 25 and 100 to 2 . The number of points is fixed to 50 in all cases considered. Each square 
was probed once every 5 minutes and we calculated the median of the exchanged bytes. 
Figure 6 shows the results achieved for the two extreme values that we used: 5 and 100 to 2 . 
In Fig. 6a, it is possible to easily identify three different regions with a similar amount of 
bytes exchanged (they are indicated with Rl, i?2, i?3). In Fig. 6b, a cell represents an area 
of 100 to 2 . Even in this case there exist regions containing more than one indistinguishable 
square. The shapes of these regions are irregular since they depend on the location, on the 
importance, and on the number of the points of interests that fall in those areas. 

The result of this analysis is that the granularity of GoogleNow in the particular region we 
analyzed is dynamic and ranges from 5 to 2 to 100 to 2 . Thus the most appropriate granularity 
should be selected to find the right balance between performance (data collection) and 
detection accuracy. Our experiments were run with 200 to 2 in our target region because the 
detection accuracy was very high regardless and this allowed us to speed up data collection 
(the entire area could be monitored every 5 minutes) and avoid overloading GoogleNow 
servers with our requests. 


5 Related Work 

The closest research areas to this work, are traffic analysis and location obfuscation. In the 
following, both of them will be briefly described. 

Traffic Analysis is devoted to exploiting observable features in an encrypted traffic to 
infer information about the content of the communication. For instance, [9, 20] leverage 
observables such as the timing and the exchanged bytes to discover communication patterns 
that can be used to break the anonymity or the confidentiality of the communication. The 
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majority of the work in this area has been conducted over HTTPS protocol [17,16,19,18], 
even though other protocols such as VoIP [24] have been analyzed as well. A variety of 
techniques such as Naive Bayes classifiers, Jaccard’s coefficient [17], common text mining 
techniques applied to the normalized frequency distribution of observable IP packets [16] 
or vector machine classifier [19] have been adopted and applied to traffic analysis in order 
to identify which website the target user had accessed. These techniques work even if the 
communication is encrypted or anonymized through networks such as Tor. Indeed, [19] 
achieved an astonishing accuracy of 97% in these cases. Several countermeasures have also 
been devised [18,17, 25]. They work by manipulating packet size, web object size, flow size, 
and the timing of the packets to hinder traffic analysis. Unfortunately, these countermeasures 
have significant performance drawbacks [17] and some of them are particularly vulnerable to 
simple attacks [12] that exploit the coarse features of traffic (e.g., total time and bandwidth). 

Location Obfuscation aims at hiding the exact geographical location of the user from 
the LBSs. Considering both location information accuracy and privacy, [8] introduces the 
concept of relevance which is used to protect the location information of users together with 
other obfuscation operators. In [15], the authors propose to adjust the resolution of location 
information along spatial or temporal dimensions to meet specific anonymity constraints. 
In [10], a peer-to-peer method is proposed where users cooperate to hide their real location 
from the LBS. Another approach proposed in [7] consists of adding controlled noise to 
the user’s location to obtain an approximate version of it which is then sent to the LBS. 
An obfuscation technique is proposed in [21] for a number of sensitive data, including IP 
addresses of users, this tecnique provides formal confidentiality guarantees under realistic 
assumptions about the adversary’s knowledge. 

6 Conclusions 

In this paper, we introduced a new adversary model for Location Based Services. The model 
takes into account an unauthorized third party, different from the LBS provider itself, that 
wants to infer the position of a LBS user. We analyzed one of the most popular location 
based apps available on Android, that is GoogleNow, and we shown that our adversary can 
infer the position of a user with an accuracy of almost 90% through a statistical analysis of 
the user’s encrypted network traffic. 
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